An Empirical Study of Vietnamese Noun Phrase Chunking with Discriminative Sequence Models

نویسندگان

  • Minh Le Nguyen
  • Huong Thao Nguyen
  • Phuong-Thai Nguyen
  • Tu Bao Ho
  • Akira Shimazu
چکیده

This paper presents an empirical work for Vietnamese NP chunking task. We show how to build an annotation corpus of NP chunking and how discriminative sequence models are trained using the corpus. Experiment results using 5 fold cross validation test show that discriminative sequence learning are well suitable for Vietnamese chunking. In addition, by empirical experiments we show that the part of speech information contribute significantly to the performance of there learning models.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Weak Semi-Markov CRFs for Noun Phrase Chunking in Informal Text

This paper introduces a new annotated corpus based on an existing informal text corpus: the NUS SMS Corpus (Chen and Kan, 2013). The new corpus includes 76,490 noun phrases from 26,500 SMS messages, annotated by university students. We then explored several graphical models, including a novel variant of the semi-Markov conditional random fields (semi-CRF) for the task of noun phrase chunking. W...

متن کامل

Jointly Labeling Multiple Sequences: A Factorial HMM Approach

We present new statistical models for jointly labeling multiple sequences and apply them to the combined task of partof-speech tagging and noun phrase chunking. The model is based on the Factorial Hidden Markov Model (FHMM) with distributed hidden states representing partof-speech and noun phrase sequences. We demonstrate that this joint labeling approach, by enabling information sharing betwee...

متن کامل

Weak Semi-Markov CRFs for NP Chunking in Informal Text

This paper introduces a new annotated corpus based on an existing informal text corpus: the NUS SMS Corpus (Chen and Kan, 2013). The new corpus includes 76,490 noun phrases from 26,500 SMS messages, annotated by university students. We then explored several graphical models, including a novel variant of the semi-Markov conditional random fields (semi-CRF) for the task of noun phrase chunking. W...

متن کامل

Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms

We describe new algorithms for training tagging models, as an alternative to maximum-entropy models or conditional random fields (CRFs). The algorithms rely on Viterbi decoding of training examples, combined with simple additive updates. We describe theory justifying the algorithms through a modification of the proof of convergence of the perceptron algorithm for classification problems. We giv...

متن کامل

Hidden Dynamic Probabilistic Models for Labeling Sequence Data

We propose a new discriminative framework, namely Hidden Dynamic Conditional Random Fields (HDCRFs), for building probabilistic models which can capture both internal and external class dynamics to label sequence data. We introduce a small number of hidden state variables to model the sub-structure of a observation sequence and learn dynamics between different class labels. An HDCRF offers seve...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009